Visual speech synthesis from 3D video
Authors
Abstract
Data-driven approaches to 2D facial animation from video have achieved highly realistic results. In this paper we introduce a process for visual speech synthesis from 3D video capture that reproduces the dynamics of 3D face shape and appearance. Animation from real speech is performed by path optimisation over a graph representation of phonetically segmented captured 3D video. A novel similarity metric based on a hierarchical wavelet decomposition is presented to identify transitions between 3D video frames that are free of visual artifacts in facial shape, appearance or dynamics. Face synthesis is performed by playing back segments of the captured 3D video, accurately reproducing facial dynamics. The framework allows visual speech synthesis from captured 3D video with minimal user intervention. Results are presented for synthesis from a database of 12 minutes (18,000 frames) of 3D video, demonstrating highly realistic facial animation.
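The abstract does not give the exact form of the wavelet-based similarity metric used to select transition frames, so the following Python sketch is only illustrative. It assumes the PyWavelets and NumPy packages and compares two grayscale frames through a hierarchical wavelet decomposition, weighting coarse sub-bands (overall face shape and appearance) more heavily than fine detail. The function name, wavelet family, decomposition depth, and per-level weights are all assumptions, not the paper's formulation.

```python
import numpy as np
import pywt


def wavelet_frame_distance(frame_a, frame_b, wavelet="haar", levels=3):
    """Return a distance between two grayscale frames (lower = more similar).

    Each frame is decomposed into a hierarchy of wavelet sub-bands; coarser
    levels are weighted more heavily than fine-detail levels, which tend to
    capture noise rather than facial shape or appearance.
    """
    dec_a = pywt.wavedec2(frame_a.astype(float), wavelet, level=levels)
    dec_b = pywt.wavedec2(frame_b.astype(float), wavelet, level=levels)

    # Approximation coefficients (coarsest level) get the largest weight.
    distance = np.mean((dec_a[0] - dec_b[0]) ** 2)

    # Detail sub-bands, from coarsest (k = 1) to finest, with assumed
    # geometrically decreasing weights.
    for k, (bands_a, bands_b) in enumerate(zip(dec_a[1:], dec_b[1:]), start=1):
        weight = 1.0 / (2 ** k)
        for sub_a, sub_b in zip(bands_a, bands_b):
            distance += weight * np.mean((sub_a - sub_b) ** 2)
    return distance


# Example: score a candidate transition between two 64x64 grayscale frames.
a = np.random.rand(64, 64)
b = np.random.rand(64, 64)
print(wavelet_frame_distance(a, b))
```

Frame pairs whose distance falls below a chosen threshold could then be added as edges of the transition graph over which the path optimisation described above is performed.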
Similar sources
Merging methods of speech visualization
The author presents MASSY, the MODULAR AUDIOVISUAL SPEECH SYNTHESIZER. The system combines two approaches to visual speech synthesis. Two control models are implemented: a (data-based) di-viseme model and a (rule-based) dominance model, both of which produce control commands in a parameterized articulation space. Analogously, two visualization methods are implemented: an image-based (video-realistic...
Building a synchronous corpus of acoustic and 3D facial marker data for adaptive audio-visual speech synthesis
We have created a synchronous corpus of acoustic and 3D facial marker data from multiple speakers for adaptive audio-visual text-to-speech synthesis. The corpus contains data from one female and two male speakers and amounts to 223 Austrian German sentences each. In this paper, we first describe the recording process, using professional audio equipment and a marker-based 3D facial motion capturi...
Acquisition of a 3D Audio-Visual Corpus of Affective Speech
Communication between humans deeply relies on our capability of experiencing, expressing, and recognizing feelings. For this reason, research on human-machine interaction needs to focus on the recognition and simulation of emotional states, prerequisite of which is the collection of affective corpora. Currently available datasets still represent a bottleneck because of the difficulties arising ...
Video-realistic synthetic speech with a parametric visual speech synthesizer
The author presents a new face module for MASSY, the Modular Audiovisual Speech SYnthesizer [1]. Within this face module the system combines two approaches to visual speech synthesis. Although the articulation space is parameterized in terms of movements of the articulators, the visual synthesis is image-based (video-realistic). The high-level visual speech synthesis generates a sequence of con...
A Framework for Data-driven Video-realistic Audio-visual Speech-synthesis
In this work, we present a framework for generating a video-realistic audio-visual "Talking Head", which can be integrated into applications as a natural human-computer interface where audio alone is not an appropriate output channel, especially in noisy environments. Our work is based on 2D-video-frame concatenative visual synthesis and a unit-selection-based Text-to-Speech system. In order to ...
Publication date: 2007